Abstract. Preconditioned gradient iterations for very large eigenvalue problems are efficient solvers 
with growing popularity. However, only for the simplest preconditioned eigensolver, namely the precondi- 
tioned gradient iteration (or preconditioned inverse iteration) with fixed step size, sharp non-asymptotic 
convergence estimates are known and these estimates require an ideally scaled preconditioner. In this 
paper a new sharp convergence estimate is derived for the preconditioned steepest descent iteration which 
combines the preconditioned gradient iteration with the Rayleigh-Ritz procedure for optimal line search 
convergence acceleration. The new estimate always improves that of the fixed step size iteration. The 
practical importance of this new estimate is that arbitrarily scaled preconditioners can be used. The 
Rayleigh-Ritz procedure implicitly computes the optimal scaling. 

Key words. eigenvalue computation; Rayleigh quotient; gradient iteration; steepest descent; preconditioner.



1. Introduction. The topic of this paper is a convergence analysis of a precondi- 
tioned gradient iteration with optimal step-length scaling in order to compute the smallest 
eigenvalue of the generalized eigenvalue problem 

$$A x_i = \lambda_i B x_i \qquad (1.1)$$

for symmetric positive definite matrices $A, B \in \mathbb{R}^{n\times n}$. A typical source of (1.1) is an eigenproblem for a self-adjoint and elliptic partial differential operator whose weak form reads



$$a(u,v) = \lambda\,(u,v) \quad \forall v \in H(\Omega). \qquad (1.2)$$

The bilinear form $a(\cdot,\cdot)$ is associated with the partial differential operator, and an $L^2(\Omega)$ inner product $(\cdot,\cdot)$ appears on the right side. Further, $u$ is an eigenfunction and $\lambda$ an eigenvalue if (1.2) is satisfied for all $v$ in an appropriate Hilbert space $H(\Omega)$. A finite element discretization of (1.2) results in (1.1). Then $A$ is called the discretization matrix and $B$ the mass matrix. These matrices are typically sparse and very large.

The eigenvalues of (1.1) are enumerated in increasing order $0 < \lambda_1 < \lambda_2 < \dots < \lambda_n$.
The smallest eigenvalue $\lambda_1$ and an associated eigenvector can be computed by means of an iterative minimization of the Rayleigh quotient

$$\rho(x) = \frac{(x, Ax)}{(x, Bx)}, \qquad (1.3)$$

where (•, •) denotes the Euclidean inner product. To this end the simplest preconditioned 
gradient iteration corrects a current iterate x in the direction of the negative precondi- 
tioned gradient of the Rayleigh quotient to form the next iterate x' 

$$x' = x - T\,(Ax - \rho(x)Bx). \qquad (1.4)$$



Therein $T$ is a symmetric positive definite matrix, which is called the preconditioner. This fixed-step-length preconditioned iteration is analyzed in [2, 6, 8]; see also the references therein.
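To make the fixed-step iteration (1.4) concrete, here is a minimal Python sketch for small dense matrices stored as lists of rows. The matrices $A = \mathrm{diag}(1, 4, 9)$, $B = I$, the diagonal preconditioner (for which the spectral radius of $I - TA$ equals $1/6$) and the start vector are illustrative assumptions, not data from this paper, and the helper names are hypothetical.

```python
# Sketch of the fixed-step preconditioned gradient iteration (1.4).
# A, B, T and the start vector are hypothetical test data.

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rayleigh_quotient(A, B, x):
    """rho(x) = (x, Ax) / (x, Bx), cf. (1.3)."""
    return dot(x, matvec(A, x)) / dot(x, matvec(B, x))


def pgi_step(A, B, T, x):
    """One step of (1.4): x' = x - T(Ax - rho(x) Bx)."""
    rho = rayleigh_quotient(A, B, x)
    residual = [ax - rho * bx for ax, bx in zip(matvec(A, x), matvec(B, x))]
    correction = matvec(T, residual)
    return [xi - ci for xi, ci in zip(x, correction)]


A = [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 9.0]]
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Diagonal preconditioner; the spectral radius of I - TA is 1/6 here.
T = [[1 / 1.2, 0.0, 0.0], [0.0, 1 / 4.4, 0.0], [0.0, 0.0, 1 / 8.1]]

x = [1.0, 1.0, 1.0]
history = [rayleigh_quotient(A, B, x)]
for _ in range(30):
    x = pgi_step(A, B, T, x)
    nrm = dot(x, x) ** 0.5
    x = [xi / nrm for xi in x]  # rho is scale invariant; normalize for safety
    history.append(rayleigh_quotient(A, B, x))

print(history[0], history[1], history[-1])
```

The Rayleigh quotients decrease monotonically and approach $\lambda_1 = 1$ for this example.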

Appropriate preconditioners $T$ are available in various ways; especially for the operator eigenproblem (1.2), multigrid or multilevel preconditioners are available. In this context the quality of the preconditioner is typically controlled in terms of a real parameter $\gamma \in [0,1)$ in a way that

$$(1-\gamma)(z, T^{-1}z) \le (z, Az) \le (1+\gamma)(z, T^{-1}z) \quad \forall z \in \mathbb{R}^n, \qquad (1.5)$$

or, equivalently, that the spectral radius of the error propagation matrix $I - TA$ is bounded by $\gamma$.

The following result for the convergence of (1.4) is known from [6, 8]; the convergence analysis interprets this preconditioned iteration as a preconditioned inverse iteration and makes use of the underlying geometry.

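Theorem 1.1. If $\lambda_i < \rho(x) < \lambda_{i+1}$, then for $x'$ given by (1.4) and assuming (1.5) it holds that $\rho(x') < \rho(x)$ and either $\rho(x') < \lambda_i$ or

$$\frac{\rho(x') - \lambda_i}{\lambda_{i+1} - \rho(x')} \le \sigma^2\,\frac{\rho(x) - \lambda_i}{\lambda_{i+1} - \rho(x)}, \qquad \sigma = \gamma + (1-\gamma)\frac{\lambda_i}{\lambda_{i+1}}. \qquad (1.6)$$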

Thm. 1.1 is, up to now, the only known sharp estimate for this iteration and for various improved and faster converging preconditioned gradient type eigensolvers. The most popular of these improved solvers are the preconditioned steepest descent iteration (PSD) and the locally optimal preconditioned conjugate gradients (LOPCG) iteration (and also their block variants) [5]. All these eigensolvers apply the Rayleigh-Ritz procedure to proper subspaces of iterates for convergence acceleration, see [7]. A systematic hierarchy of these preconditioned gradient iterations and their variants for exact inverse preconditioning (which amounts to certain Invert-Lanczos processes [15]) has been suggested in [13]. The aim of this paper is to prove a new sharp convergence estimate for the preconditioned steepest descent iteration (PSD).

1.1. Assumptions on the preconditioner. A drawback of Thm. 1.1 is its assumption (1.5) on the preconditioner $T$. The existence of constants $1 \pm \gamma$ with $\gamma < 1$ is not guaranteed for arbitrary (multigrid) preconditioners, but it can always be ensured after a proper scaling of the preconditioner. To make this clear, take an arbitrary pair of symmetric positive definite matrices $A, T \in \mathbb{R}^{n\times n}$. Then constants $\gamma_1, \gamma_2 > 0$ exist so that the spectral equivalence

$$\gamma_1(z, T^{-1}z) \le (z, Az) \le \gamma_2(z, T^{-1}z) \quad \forall z \in \mathbb{R}^n \qquad (1.7)$$

holds. If a preconditioner $T$ satisfies (1.7), then the scaled preconditioner $(2/(\gamma_1+\gamma_2))\,T$ fulfills (1.5) with

$$\gamma = \frac{\gamma_2 - \gamma_1}{\gamma_1 + \gamma_2}. \qquad (1.8)$$
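The rescaling argument can be checked on a small example. The Python sketch below assumes diagonal $A$ and $T$ (a simplifying, hypothetical choice), so that the eigenvalues of $TA$, and hence $\gamma_1$ and $\gamma_2$ in (1.7), are just the products $t_i a_i$. The unscaled preconditioner violates (1.5) (the spectral radius of $I - TA$ exceeds 1), while the scaled one satisfies (1.5) with $\gamma$ from (1.8).

```python
# Sketch: scale a diagonal preconditioner by 2/(gamma_1 + gamma_2), cf. (1.7)-(1.8).
# Diagonal A and T are an illustrative assumption that keeps eig(TA) explicit.

def equivalence_constants(a_diag, t_diag):
    """gamma_1, gamma_2 of (1.7): the extreme eigenvalues of TA = diag(t_i a_i)."""
    ratios = [t * a for t, a in zip(t_diag, a_diag)]
    return min(ratios), max(ratios)


def scaled_preconditioner(a_diag, t_diag):
    g1, g2 = equivalence_constants(a_diag, t_diag)
    scale = 2.0 / (g1 + g2)
    gamma = (g2 - g1) / (g1 + g2)  # Eq. (1.8)
    return [scale * t for t in t_diag], gamma


def spectral_radius_error_matrix(a_diag, t_diag):
    """Spectral radius of I - TA for diagonal A and T."""
    return max(abs(1.0 - t * a) for t, a in zip(t_diag, a_diag))


a_diag = [1.0, 4.0, 9.0]
t_diag = [1.5, 0.6, 0.15]  # hypothetical, badly scaled preconditioner
t_scaled, gamma = scaled_preconditioner(a_diag, t_diag)

print(spectral_radius_error_matrix(a_diag, t_diag))    # unscaled: larger than 1
print(gamma, spectral_radius_error_matrix(a_diag, t_scaled))
```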

A clear benefit of the preconditioned steepest descent iteration is that, by computing the optimal step length parameter $\vartheta_{\mathrm{opt}}$, see Eq. (1.9), the scaling parameter $2/(\gamma_1+\gamma_2)$ is determined implicitly. Therefore, we can use the assumption (1.7) or, alternatively, the more convenient form (1.5). This guarantees the practical applicability of the preconditioned steepest descent iteration for any preconditioner satisfying (1.7) or, in its scaled form, satisfying (1.5).



1.2. The optimal-step-length iteration: Preconditioned steepest descent. 

A disadvantage of the gradient iteration (1.4) is its fixed step length, which results in a non-optimal new iterate $x'$. An obvious improvement is to compute $x'$ as the minimizer of the Rayleigh quotient (1.3) in the affine space $\{x - \vartheta T(Ax - \rho(x)Bx);\ \vartheta \in \mathbb{R}\}$. That means we consider the optimally scaled iteration

$$x' = x - \vartheta_{\mathrm{opt}}\,T(Ax - \rho(x)Bx) \qquad (1.9)$$

with the optimal step length

$$\vartheta_{\mathrm{opt}} = \arg\min_{\vartheta\in\mathbb{R}} \rho\big(x - \vartheta T(Ax - \rho(x)Bx)\big).$$
This iteration is called the preconditioned steepest descent iteration (PSD) [1, 7, 14]. Computationally, one gets $x'$ and its Rayleigh quotient $\rho(x')$ by the Rayleigh-Ritz procedure. If $T(Ax - \rho(x)Bx)$ is not an eigenvector, then $(x', \rho(x'))$ is a Ritz pair of $(A, B)$ with respect to the column space of $[x, T(Ax - \rho(x)Bx)]$. As (1.9) aims at a minimization of the Rayleigh quotient, $\rho(x')$ is the smaller Ritz value and $x'$ is an associated Ritz vector. The Rayleigh-Ritz procedure computes the optimal step length implicitly; the step length is determined by the components of the associated eigenvector of the Rayleigh-Ritz projection matrices. Consequently, the preconditioned steepest descent iteration converges at least as fast as the fixed-step-length scheme (1.4), since

$$\rho\big(x - \vartheta_{\mathrm{opt}}T(Ax - \rho(x)Bx)\big) \le \rho\big(x - T(Ax - \rho(x)Bx)\big). \qquad (1.10)$$

Therefore Thm. 1.1 serves as a trivial upper estimate for the accelerated iteration (1.9).
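The following Python sketch illustrates how the Rayleigh-Ritz procedure yields $\rho(x')$ for (1.9) without computing $\vartheta_{\mathrm{opt}}$ explicitly: the smaller Ritz value is the smaller root of the $2\times 2$ determinant equation $\det(\hat A - \theta\hat B) = 0$ for the projections onto $\mathrm{span}\{x, T(Ax - \rho(x)Bx)\}$. The test matrices are hypothetical. The sketch compares the result with the fixed step (1.4), cf. (1.10), and shows that multiplying $T$ by 10 leaves the PSD value unchanged up to rounding, which reflects the implicit scaling discussed in Sec. 1.1.

```python
import math


def matvec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rayleigh_quotient(A, B, x):
    return dot(x, matvec(A, x)) / dot(x, matvec(B, x))


def preconditioned_residual(A, B, T, x):
    """T(Ax - rho(x) Bx), the search direction of (1.4) and (1.9)."""
    rho = rayleigh_quotient(A, B, x)
    r = [ax - rho * bx for ax, bx in zip(matvec(A, x), matvec(B, x))]
    return matvec(T, r)


def psd_ritz_value(A, B, T, x):
    """rho(x') of PSD (1.9): the smaller Ritz value of (A, B) on span{x, T r}."""
    w = preconditioned_residual(A, B, T, x)
    basis = [x, w]
    ah = [[dot(u, matvec(A, v)) for v in basis] for u in basis]  # projection of A
    bh = [[dot(u, matvec(B, v)) for v in basis] for u in basis]  # projection of B
    # Ritz values are the roots of det(ah - theta bh) = q theta^2 - p theta + c.
    q = bh[0][0] * bh[1][1] - bh[0][1] ** 2
    p = ah[0][0] * bh[1][1] + ah[1][1] * bh[0][0] - 2.0 * ah[0][1] * bh[0][1]
    c = ah[0][0] * ah[1][1] - ah[0][1] ** 2
    disc = math.sqrt(max(p * p - 4.0 * q * c, 0.0))
    return 2.0 * c / (p + disc)  # smaller root, written without cancellation


def fixed_step_value(A, B, T, x):
    """rho(x') for the fixed-step iteration (1.4)."""
    w = preconditioned_residual(A, B, T, x)
    return rayleigh_quotient(A, B, [xi - wi for xi, wi in zip(x, w)])


A = [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 9.0]]
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [[1 / 1.2, 0.0, 0.0], [0.0, 1 / 4.4, 0.0], [0.0, 0.0, 1 / 8.1]]
T10 = [[10.0 * t for t in row] for row in T]  # the same preconditioner, badly scaled
x0 = [1.0, 1.0, 1.0]

print(fixed_step_value(A, B, T, x0), psd_ritz_value(A, B, T, x0),
      psd_ritz_value(A, B, T10, x0))
```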
The aim of this paper is to prove the following sharp convergence estimate for (1.9).

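Theorem 1.2. Let $x \in \mathbb{R}^n$ and let $x'$ be the PSD iterate given by (1.9). The preconditioner $T$ is assumed to satisfy (1.7). If $\lambda_i < \rho(x) < \lambda_{i+1}$, $i = 1, \dots, n-1$, then $\rho(x') < \rho(x)$ and either $\rho(x') < \lambda_i$ or

$$\frac{\rho(x') - \lambda_i}{\lambda_{i+1} - \rho(x')} \le \sigma^2\,\frac{\rho(x) - \lambda_i}{\lambda_{i+1} - \rho(x)} \qquad (1.11)$$

with

$$\sigma = \frac{\kappa + \gamma(2-\kappa)}{(2-\kappa) + \gamma\kappa}, \qquad \kappa = \frac{\lambda_i(\lambda_n - \lambda_{i+1})}{\lambda_{i+1}(\lambda_n - \lambda_i)},$$

and $\gamma := (\gamma_2 - \gamma_1)/(\gamma_1 + \gamma_2)$. If $\gamma_1 = 1 - \gamma$ and $\gamma_2 = 1 + \gamma$ as in (1.5), then $(\gamma_2 - \gamma_1)/(\gamma_1 + \gamma_2) = \gamma$. The estimate is sharp and can be attained for $\rho(x) \to \lambda_i$ in the 3D invariant subspace associated with the eigenvalues $\lambda_i$, $\lambda_{i+1}$ and $\lambda_n$, $i + 1 \ne n$.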

The limit case $\gamma = 0$ of Thm. 1.2 is an estimate for the convergence of the steepest descent iteration which minimizes the Rayleigh quotient in the space $\mathrm{span}\{x, A^{-1}Bx\}$. Then the convergence estimate (1.11) reads

$$\frac{\rho(x') - \lambda_i}{\lambda_{i+1} - \rho(x')} \le \Big(\frac{\kappa}{2-\kappa}\Big)^2\,\frac{\rho(x) - \lambda_i}{\lambda_{i+1} - \rho(x)}$$

with $\kappa$ given by (1.11). A proof of this result (in the general setup of steepest ascent and steepest descent for $A$ and $A^{-1}$) has recently been given in [15]; for the smallest eigenvalue (that is, for $i = 1$) the estimate was proved in [9]. This paper generalizes this result on steepest descent for $A^{-1}B$ to the preconditioned variant of this iteration. For the following analysis we always assume a properly scaled preconditioner satisfying (1.5). If $T$ fulfills (1.7), we use $(2/(\gamma_1+\gamma_2))\,T$ (and call the scaled preconditioner once again $T$) so that $\gamma$ is given by (1.8) and (1.5) is fulfilled. This substitution does not restrict the generality of the approach, since the scaling constant is implicitly computed with $\vartheta_{\mathrm{opt}}$ in the Rayleigh-Ritz procedure. We prefer to work with (1.5) since this allows us to set up the proper geometry for the following proof.
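The convergence factors of Thm. 1.1 and Thm. 1.2 are explicit, so they can be compared directly. This Python sketch evaluates $\kappa$ and $\sigma$ from (1.11) and the factor $\gamma + (1-\gamma)\lambda_i/\lambda_{i+1}$ from (1.6) for a hypothetical spectrum; it only evaluates the formulas and does not verify the theorems. (A short computation gives $\gamma + (1-\gamma)\kappa - \sigma = \kappa(1-\kappa)(1-\gamma)^2/((2-\kappa)+\gamma\kappa) \ge 0$, and $\kappa \le \lambda_i/\lambda_{i+1}$, which is consistent with the claim that the new estimate always improves that of Thm. 1.1.)

```python
def kappa(lam, i):
    """kappa of (1.11); lam is sorted increasingly, i is 1-based with lam_i < rho(x) < lam_{i+1}."""
    lam_i, lam_next, lam_n = lam[i - 1], lam[i], lam[-1]
    return lam_i * (lam_n - lam_next) / (lam_next * (lam_n - lam_i))


def sigma_psd(gamma, kap):
    """sigma from Thm. 1.2."""
    return (kap + gamma * (2.0 - kap)) / ((2.0 - kap) + gamma * kap)


def sigma_fixed(gamma, lam, i):
    """sigma = gamma + (1 - gamma) lambda_i / lambda_{i+1} from Thm. 1.1."""
    return gamma + (1.0 - gamma) * lam[i - 1] / lam[i]


lam = [1.0, 2.0, 10.0]  # hypothetical eigenvalues lambda_1 < lambda_2 < lambda_3
kap = kappa(lam, 1)
for gamma in (0.0, 0.25, 0.5, 0.9):
    print(gamma, sigma_psd(gamma, kap), sigma_fixed(gamma, lam, 1))
```

For $\gamma = 0$ the PSD factor reduces to $\kappa/(2-\kappa)$, the steepest descent factor quoted above.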

Only a few convergence estimates for PSD have been published. Of major importance are the work of Samokish [19], the results of Knyazev given in Thm. 3.3 together with Eq. (3.3) in [1], and further the results of Ovtchinnikov [18]. Knyazev uses similar assumptions and applies Chebyshev polynomials to derive the convergence estimate. Ovtchinnikov in [18] derives an asymptotic convergence factor which represents the average error reduction per iteration; further, non-asymptotic estimates are proved under specific assumptions on the preconditioner. The result of Samokish (only available in Russian) is reproduced in a finite-dimensional non-asymptotic form as Thm. 2.1 in [18]; see also Cor. 6.4 and the following paragraph in [18] for a critical discussion and comparison of these estimates. Due to different assumptions and a different form of the convergence estimates, these results are not easy to compare with (1.11); an important difference is that in Thm. 1.2 the restrictive assumption $\rho(x) < \lambda_2$ is not needed.

1.3. Overview. This paper is organized as follows. In Sec. 2 the geometry of PSD is introduced. Further, the problem is reformulated in terms of reciprocals of the eigenvalues, which makes the geometry of PSD accessible within the Euclidean space. Sec. 3 gives a proof that PSD attains its poorest convergence in a three-dimensional invariant subspace of $\mathbb{R}^n$. Sec. 4 contains a mini-dimensional analysis of PSD. Finally, the three-dimensional convergence estimates are embedded into the full $\mathbb{R}^n$, which completes the convergence analysis.

2. The geometry of the preconditioned steepest descent iteration. For the analysis of the preconditioned steepest descent iteration it is convenient to work with the linear pencil $B - \mu A$ (instead of $A - \lambda B$). The advantage is that the $A$-norm turns, by a proper basis transformation, into the Euclidean norm, see below. A further benefit of this representation is that a generalization to a symmetric positive semidefinite or even only a symmetric $B$ is possible (cf. the analysis of (1.4) in [8]). Hence for the pencil $B - \mu A$ the eigenvalues $\mu_i$ are given by

$$Bx_i = \mu_i A x_i \quad\text{with}\quad \mu_i = 1/\lambda_i, \quad i = 1, \dots, n.$$

Therefore the problem is to compute the largest eigenvalue $\mu_1$ by maximizing the inverse of the Rayleigh quotient (1.3),

$$\mu(x) := \frac{(x, Bx)}{(x, Ax)} = \frac{1}{\rho(x)}. \qquad (2.1)$$

Lemma 2.1. Without loss of generality we can assume that $A = I$ and that $B = \mathrm{diag}(\mu_1, \dots, \mu_n)$ with simple eigenvalues $\mu_1 > \mu_2 > \dots > \mu_n > 0$. This transforms (1.9) (after multiplication with $\mu(x) = 1/\rho(x)$ and by denoting the transformed preconditioner again by $T$) into the form

$$\mu(x)x' = \mu(x)x + \vartheta_{\mathrm{opt}}T(Bx - \mu(x)x) \qquad (2.2)$$

with the optimal step length

$$\vartheta_{\mathrm{opt}} = \arg\max_{\vartheta\in\mathbb{R}} \mu\big(\mu(x)x + \vartheta T(Bx - \mu(x)x)\big).$$

The quality constraint (1.5) on the preconditioner $T \in \mathbb{R}^{n\times n}$ turns into a bound for the spectral norm $\|\cdot\|$ of the symmetric matrix $I - T$, which reads

$$\|I - T\| \le \gamma. \qquad (2.3)$$



Proof. The generalized eigenvalue problem (1.1) is first transformed into a standard eigenvalue problem $C^{-1}BC^{-T}y = \mu y$ using the Cholesky factorization $A = CC^T$, $y = C^Tx$ and $\mu = 1/\lambda$. The symmetric matrix $C^{-1}BC^{-T}$ can be diagonalized by means of an orthogonal similarity transformation. Then all transformations are applied to (1.9). For convenience we denote the transformed system matrix by $B$. Further, the transformed preconditioner is denoted, once again, by $T$, since (1.5) still holds with $A = I$. All this results in (2.2) and (2.3).
To show that the proof of Thm. 1.2 can be restricted to the simple eigenvalue case, we apply the same continuity argument which has been used in Theorem 2.1 in [8]. The argument is based on a perturbation $B_\varepsilon$ of $B$ having only simple eigenvalues. Then the perturbation $\varepsilon$ is reduced to 0. The continuous dependence of $x'$ and $\mu(x')$ on the perturbation completes the proof. This reasoning can be transferred to PSD since the Rayleigh-Ritz procedure preserves the continuity of the eigenvalue approximations. □
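Writing (1.4) in the variable $y = C^Tx$ gives the transformed preconditioner $C^TTC$, and $I - C^TTC$ is similar to $I - TA$, since $C^{-T}(I - C^TTC)C^T = I - TCC^T = I - TA$. Hence the spectral norm in (2.3) equals the spectral radius of $I - TA$ used in (1.5). The following minimal $2\times 2$ Python check uses hypothetical matrices and closed-form $2\times 2$ eigenvalues instead of a library eigensolver.

```python
import math


def cholesky_2x2(A):
    """Lower triangular C with A = C C^T for a 2x2 s.p.d. matrix."""
    c11 = math.sqrt(A[0][0])
    c21 = A[1][0] / c11
    c22 = math.sqrt(A[1][1] - c21 * c21)
    return [[c11, 0.0], [c21, c22]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def eig_2x2(M):
    """Eigenvalues (ascending) of a 2x2 matrix with real spectrum, from trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    half_gap = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return [tr / 2.0 - half_gap, tr / 2.0 + half_gap]


A = [[4.0, 2.0], [2.0, 3.0]]     # hypothetical s.p.d. matrix
T = [[0.2, 0.03], [0.03, 0.25]]  # hypothetical s.p.d. preconditioner

C = cholesky_2x2(A)
T_tilde = matmul(matmul(transpose(C), T), C)  # preconditioner in the variable y = C^T x
E_tilde = [[(1.0 if i == j else 0.0) - T_tilde[i][j] for j in range(2)] for i in range(2)]
TA = matmul(T, A)
E_orig = [[(1.0 if i == j else 0.0) - TA[i][j] for j in range(2)] for i in range(2)]

gamma = max(abs(e) for e in eig_2x2(E_orig))        # spectral radius of I - TA, cf. (1.5)
norm_tilde = max(abs(e) for e in eig_2x2(E_tilde))  # spectral norm of symmetric I - T~, cf. (2.3)
print(gamma, norm_tilde)
```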

Next, the reformulation of Thm. 1.2 in terms of the $\mu$-notation is stated.

Theorem 2.2. If $\mu_{i+1} < \mu(x) < \mu_i$, then $\mu(x') > \mu(x)$ and either $\mu(x') > \mu_i$ or

$$\frac{\mu_i - \mu(x')}{\mu(x') - \mu_{i+1}} \le \sigma^2\,\frac{\mu_i - \mu(x)}{\mu(x) - \mu_{i+1}} \qquad (2.4)$$

with

$$\sigma = \frac{\kappa + \gamma(2-\kappa)}{(2-\kappa) + \gamma\kappa} \quad\text{and}\quad \kappa = \frac{\mu_{i+1} - \mu_n}{\mu_i - \mu_n}.$$

The estimate is sharp and can be attained for $\mu(x) \to \mu_i$ in the 3D invariant subspace associated with the eigenvalues $\mu_i$, $\mu_{i+1}$ and $\mu_n$, $i + 1 \ne n$.

2.1. The cone of PSD iterates. The starting point of the geometric description of PSD is the non-scaled preconditioned gradient iteration (1.4), whose $\mu$-representation reads

$$\mu(x)x' = \mu(x)x + T(Bx - \mu(x)x) = Bx - (I - T)(Bx - \mu(x)x). \qquad (2.5)$$

A central idea of its convergence analysis in [6, 11, 12] is to treat the preconditioners as a whole. This means that all admissible preconditioners satisfying the spectral equivalence (2.3) are inserted into (2.5) with $x$ being fixed. This results in a set $\mathcal{B}_\gamma(x)$ of all possible iterates

$$\mathcal{B}_\gamma(x) := \{Bx - (I - T)(Bx - \mu(x)x);\ T \text{ s.p.d. with } \|I - T\| \le \gamma\}. \qquad (2.6)$$

The set $\mathcal{B}_\gamma(x)$ is a full ball with the center $Bx$ and the radius $\gamma\|Bx - \mu(x)x\|$. The subject of the convergence analysis of (2.5) in [11], [12] is to localize a vector of poorest convergence (i.e., with the smallest Rayleigh quotient) in $\mathcal{B}_\gamma(x)$ and to derive an estimate for its Rayleigh quotient.

In contrast to (2.5), the PSD iteration (2.2) works with an optimal step length parameter $\vartheta_{\mathrm{opt}}$ in order to maximize the Rayleigh quotient in the one-dimensional affine space

$$\mu(x)x + \vartheta T(Bx - \mu(x)x), \quad \vartheta \in \mathbb{R}. \qquad (2.7)$$

The union of all these affine spaces for all the preconditioners satisfying (2.3) is the smallest circular cone with its vertex in $\mu(x)x$ which encloses $\mathcal{B}_\gamma(x)$. This cone is denoted by $\mathcal{F}_\gamma(x)$, see Fig. 2.1, and it holds that

$$\begin{aligned}\mathcal{F}_\gamma(x) &:= \{\mu(x)x + \vartheta(y - \mu(x)x);\ y \in \mathcal{B}_\gamma(x),\ \vartheta \in \mathbb{R}\}\\ &= \{\mu(x)x + \vartheta d;\ \|Bx - (\mu(x)x + d)\| \le \gamma\|Bx - \mu(x)x\|,\ \vartheta \in \mathbb{R}\}.\end{aligned}$$

2.2. The geometric convergence analysis as a two-level optimization. The geometric convergence analysis of preconditioned steepest descent consists of estimating the poorest convergence behavior. Therefore a two-level optimization problem is to be solved. On the one hand, one has to determine the affine space (2.7) in the cone $\mathcal{F}_\gamma(x)$ in which the maximum of the Rayleigh quotient (i.e., the largest Ritz value in this space) takes its smallest value; this vector is associated with the poorest convergence due to the choice of the preconditioner. On the other hand, the cone $\mathcal{F}_\gamma(x)$ depends on $x$; hence one can analyze the dependence of this vector of poorest convergence on all vectors in $\mathbb{R}^n$ having the same Rayleigh quotient as $x$. This amounts to considering the level set of vectors having a fixed Rayleigh quotient $\mu_0$, i.e.,

$$\mathcal{L}(\mu_0) := \{x \in \mathbb{R}^n;\ \mu(x) = \mu_0\}.$$

Let $x^* \in \mathcal{L}(\mu_0)$ be the minimizer representing the poorest convergence and let $d^* \in \mathcal{F}_\gamma(x^*) - \mu_0x^*$ be the search direction of poorest convergence. So the two-level optimization is

$$\underline{\mu} := \min_{x\in\mathcal{L}(\mu_0)}\ \min_{d\in\mathcal{F}_\gamma(x) - \mu_0x} \mu\big(\mu_0x + \vartheta_{\mathrm{opt}}[x, d]\,d\big).$$

Therein $\mu_0x + \vartheta_{\mathrm{opt}}[x, d]\,d$ is the Ritz vector which is associated with the larger Ritz value $\mu(\mu_0x + \vartheta_{\mathrm{opt}}[x, d]\,d)$ in $\mathrm{span}\{x, d\}$. The factor $\vartheta_{\mathrm{opt}} = \vartheta_{\mathrm{opt}}[x, d]$ depends on $x$ and $d$. The minimum $\underline{\mu}$ is now to be estimated from below.

3. The level set optimization - a reduction to 3D. The aim of this section is to show that the poorest convergence of PSD with respect to the admissible preconditioners and with respect to all vectors $x \in \mathcal{L}(\mu_0)$ is attained in a three-dimensional $B$-invariant subspace of $\mathbb{R}^n$.

The representation (2.7) of the PSD iteration applies the line search to $d \in \mathcal{F}_\gamma(x) - \mu(x)x$. This may result in an unbounded step length. To see this, let $d = e_1 = (1, 0, \dots, 0)^T$, which is an eigenvector of $B$. If $\gamma$ is close to 1, then $e_1 \in \mathcal{F}_\gamma(x) - \mu(x)x$ can be attained since $\lim_{\gamma\to 1}\mathcal{F}_\gamma(x) = \mathbb{R}^n$. The unboundedness is a consequence of $\lim_{\vartheta\to\infty}\mu(\mu(x)x + \vartheta e_1) = \mu_1$. The potential unboundedness of the step length has already been pointed out by Knyazev [10].

Next we want to avoid this singularity. Therefore, let $x' = \vartheta x + d$. Due to $\mu(x') > \mu(x)$ (which is guaranteed by Thm. 1.1), $\vartheta$ is bounded. So the minimization problem is reformulated as

$$\underline{\mu} := \min_{x\in\mathcal{L}(\mu_0)}\ \min_{d\in\mathcal{F}_\gamma(x) - \mu_0x} \mu\big(\vartheta_{\mathrm{opt}}[x, d]\,x + d\big). \qquad (3.1)$$

In the next theorem a necessary condition characterizing this minimum is derived by means of the Kuhn-Tucker conditions [17]. The application of the Kuhn-Tucker conditions in the context of the convergence analysis of the fixed-step-size preconditioned gradient iteration has been suggested by R. Argentati, see [1].

Theorem 3.1. The minimum (3.1) is attained in a three-dimensional $B$-invariant subspace of $\mathbb{R}^n$.

If PSD does not terminate in an eigenvector, then the associated Ritz vector $w$ of poorest convergence is also contained in the same three-dimensional $B$-invariant subspace of $\mathbb{R}^n$, i.e.,

$$(B + a)w = c(B + b)x$$

with $a, b, c \in \mathbb{R}$ and $B + a$ being a regular matrix.

Proof. The minimization problem (3.1) reads as follows:

Minimize

$$\mu(\vartheta_{\mathrm{opt}}x + d)$$

with respect to $x, d \in \mathbb{R}^n$ satisfying the two constraints:

1. The cone inequality constraint $d \in \mathcal{F}_\gamma(x) - \mu_0x$:

$$\begin{aligned}g(x, d) &= \|Bx - (\mu_0x + d)\|^2 - \gamma^2\|Bx - \mu_0x\|^2\\ &= (1-\gamma^2)\|Bx - \mu_0x\|^2 - 2(Bx - \mu_0x, d) + \|d\|^2 \le 0.\end{aligned}$$

2. The level set constraint $x \in \mathcal{L}(\mu_0)$:

$$h(x, d) = (x, Bx) - \mu_0(x, x) = 0.$$

Therein $\vartheta_{\mathrm{opt}} = \vartheta_{\mathrm{opt}}[x, d] \in \mathbb{R}$ is a functional depending on $x$ and $d$ which maximizes the Rayleigh quotient in the two-dimensional subspace $\mathrm{span}\{x, d\}$. Equivalently, $w := \vartheta_{\mathrm{opt}}x + d$ is a Ritz vector corresponding to the larger Ritz value in just this two-dimensional subspace. The first constraint guarantees that $d$ is an admissible search direction, i.e., the distance of $\mu_0x + d$ to the center $Bx$ of the ball $\mathcal{B}_\gamma(x)$ is bounded by its radius $\gamma\|Bx - \mu_0x\|$. The Karush-Kuhn-Tucker stationarity condition for a local minimizer $(x^*, d^*)$ reads

$$\nabla_{(x,d)}\mu(\vartheta_{\mathrm{opt}}x^* + d^*) + \alpha\nabla_{(x,d)}g(x^*, d^*) + \beta\nabla_{(x,d)}h(x^*, d^*) = 0$$

with the multipliers $\alpha$ and $\beta$. In order to simplify the notation, the asterisks are omitted from now on.

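Next we derive the gradients of these functions $\mu$, $g$ and $h$ with respect to $x$ and $d$. The chain rule gives (for column vectors)

$$\nabla_x\big(\mu(\vartheta_{\mathrm{opt}}x + d)\big) = \big(D_x(\vartheta_{\mathrm{opt}}x + d)\big)^T(\nabla\mu)(\vartheta_{\mathrm{opt}}x + d).$$

It holds that

$$D_x(\vartheta_{\mathrm{opt}}x + d) = x(\nabla_x\vartheta_{\mathrm{opt}})^T + \vartheta_{\mathrm{opt}}I.$$

With $w := \vartheta_{\mathrm{opt}}x + d$ we get

$$\nabla_x\big(\mu(\vartheta_{\mathrm{opt}}x + d)\big) = \vartheta_{\mathrm{opt}}(\nabla\mu)(w) + (\nabla_x\vartheta_{\mathrm{opt}})\big(x, (\nabla\mu)(w)\big) = \vartheta_{\mathrm{opt}}(\nabla\mu)(w) = \vartheta_{\mathrm{opt}}\frac{2}{(w, w)}\big(Bw - \mu(w)w\big).$$

Therein, $(x, (\nabla\mu)(w)) = 0$ has been used, which holds since $(\nabla\mu)(w)$ is collinear to the residual of the Ritz vector and, further, by the definition of a Ritz vector, its residual is orthogonal to the approximating subspace $\mathrm{span}\{x, d\}$. For the $d$-gradient it holds that

$$\nabla_d\big(\mu(\vartheta_{\mathrm{opt}}x + d)\big) = (\nabla\mu)(w) = \frac{2}{(w, w)}\big(Bw - \mu(w)w\big).$$

The gradients of the constraining functions $g$ and $h$ with $r = Bx - \mu_0x$ are

$$\nabla_xg(x, d) = 2(1-\gamma^2)(B - \mu_0)r - 2(B - \mu_0)d, \qquad \nabla_xh(x, d) = 2r,$$
$$\nabla_dg(x, d) = -2(B - \mu_0)x + 2d = 2(d - r), \qquad \nabla_dh(x, d) = 0.$$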
Hence the $x$-components of the Karush-Kuhn-Tucker stationarity condition are (after division by 2)

$$\frac{\vartheta_{\mathrm{opt}}}{(w, w)}(B - \mu(w))w + \alpha\big[(1-\gamma^2)(B - \mu_0)^2x - (B - \mu_0)(w - \vartheta_{\mathrm{opt}}x)\big] + \beta r = 0, \qquad (3.2)$$

and the $d$-components read $(Bw - \mu(w)w) + \alpha(w, w)(d - r) = 0$. The equation for the $d$-components can be reformulated as

$$(B + a)w = \alpha(w, w)(B + b)x \qquad (3.3)$$

with $a = \alpha(w, w) - \mu(w)$ and $b = \vartheta_{\mathrm{opt}} - \mu_0$. Multiplication of (3.2) with $B + a$ and insertion of (3.3) results in

$$\alpha\Big\{(1-\gamma^2)(B - \mu_0)^2(B + a)x - (B - \mu_0)\big[\alpha(w, w)(B + b)x - \vartheta_{\mathrm{opt}}(B + a)x\big]\Big\} + \alpha\vartheta_{\mathrm{opt}}(B - \mu(w))(B + b)x + \beta(B + a)(B - \mu_0)x = 0.$$

This can be expressed as

$$p_3(B)x = 0 \qquad (3.4)$$

with a third-order polynomial $p_3$. Due to the basis assumptions, $B$ is a diagonal matrix and so $p_3(B)$ is diagonal. As $p_3$ has at most three different zeros, (3.4) can only hold if $x$ has at most three nonzero components, which proves the first assertion.

Hence $x \in \mathrm{span}\{e_j, e_k, e_l\}$ for proper indices $j$, $k$ and $l$. For that $x$, Eq. (3.3) shows that $w$ has not more than four nonzero components; four nonzero components are only possible if $a = -\mu_s$ for some $s \notin \{j, k, l\}$. Then (3.2) can be written as $p_1(B)w = p_2(B)x \in \mathrm{span}\{e_j, e_k, e_l\}$ with a first-order polynomial $p_1$ and a second-order polynomial $p_2$. The latter equation implies that $p_1(\mu_s) = p_1(-a) = 0$. The $s$-th component of the polynomial identity results in $a = (\alpha\mu_0(w, w) - \mu(w)\vartheta_{\mathrm{opt}})/(\vartheta_{\mathrm{opt}} - \alpha(w, w))$. Together with the known form $a = \alpha(w, w) - \mu(w)$ we get by direct computation that $a = b$. Insertion of this result into (3.3) shows that $w = \alpha(w, w)x + Ce_s$ for a real constant $C$. Then $x \perp e_s$, and $x$ and $e_s$ are the Ritz vectors. PSD terminates in $e_s$, and $w$ with not more than three nonzero components is the normal case. □

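4. The cone optimization - a mini-dimensional geometric analysis. Next, the convergence behavior with respect to the cone $\mathcal{F}_\gamma(x)$ is analyzed. Some of the following arguments are valid in $\mathbb{R}^n$; however, we need these properties only for $n = 3$.

The (half) opening angle $\varphi$ of the cone $\mathcal{F}_\gamma(x)$ is given by $\sin\varphi = \gamma$, since $\gamma$ is the ratio of the radius $\gamma\|Bx - \mu(x)x\|$ of the ball $\mathcal{B}_\gamma(x)$, see (2.6), and its (maximal) radius $\|Bx - \mu(x)x\|$ for $\gamma \to 1$, which is the distance between the vertex $\mu(x)x$ and the center $Bx$. With $\cos\varphi = \sqrt{1-\gamma^2}$ the cone $\mathcal{F}_\gamma(x)$ can be written as

$$\mathcal{F}_\gamma(x) := \mu(x)x + \Big\{z \in \mathbb{R}^n,\ z \ne 0;\ \Big|\Big(\frac{z}{\|z\|}, \frac{Bx - \mu(x)x}{\|Bx - \mu(x)x\|}\Big)\Big| \ge \sqrt{1-\gamma^2}\Big\}.$$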

4.1. Restriction to nonnegative vectors. The analysis of PSD can be restricted to componentwise nonnegative vectors $x \in \mathbb{R}^n$. The justification is as follows. Consider the Householder reflections $H_i = I - 2e_ie_i^T$, for which $x \mapsto H_ix$ changes the sign of the $i$-th component of $x$. The Rayleigh quotient is invariant under $H_i$, i.e., $\mu(x) = \mu(H_ix)$, since $H_i$ is orthogonal and commutes with the diagonal matrix $B$. If $v$ is an admissible search direction, i.e., $v \in \mathcal{F}_\gamma(x) - \mu(x)x$, then

$$\begin{aligned}\cos\angle(v, Bx - \mu(x)x) &= \Big(\frac{v}{\|v\|}, \frac{Bx - \mu(x)x}{\|Bx - \mu(x)x\|}\Big) = \Big(\frac{H_iv}{\|H_iv\|}, \frac{BH_ix - \mu(H_ix)H_ix}{\|BH_ix - \mu(H_ix)H_ix\|}\Big)\\ &= \cos\angle(H_iv, BH_ix - \mu(H_ix)H_ix),\end{aligned}$$

which means that $H_iv$ encloses the same angle with the residual vector associated with $H_ix$. As for all $\alpha \in \mathbb{R}$

$$\mu\big(\mu(H_ix)H_ix + \alpha H_iv\big) = \mu\big(H_i(\mu(x)x + \alpha v)\big) = \mu\big(\mu(x)x + \alpha v\big),$$

any Rayleigh quotient in the cone $\mathcal{F}_\gamma(x)$ can be reproduced in the cone $\mathcal{F}_\gamma(H_ix)$ and vice versa. Thus the analysis can be restricted to $x \ge 0$.

4.2. The poorest convergence in the three-dimensional cone $\mathcal{F}_\gamma(x)$. Any circular cross section (with nonzero radius) of $\mathcal{F}_\gamma(x)$ can serve to represent the admissible search directions, see Fig. 2.2. Next we work with the disc

$$S^\circ_\gamma(x) := \mu(x)x + (1-\gamma^2)r + \{\ell y;\ y \in \mathbb{R}^3,\ \|y\| \le 1,\ y \perp r\} \qquad (4.1)$$

with $r := Bx - \mu(x)x$. Its radius $\ell$, see Fig. 4.1, is given by

$$\ell = \gamma\sqrt{1-\gamma^2}\,\|r\|. \qquad (4.2)$$

Further, we use only search directions $d \in S^\circ_\gamma(x) - \mu(x)x$ which are orthogonalized against $x$; this is justified since the Rayleigh-Ritz approximations (and so the PSD iterate $x'$) only depend on the subspace. So the set of relevant search directions forms a line segment. By using the vector $v = x\times r/\|x\times r\| = x\times r/(\|x\|\,\|r\|)$, one can construct the intersection of this line segment with the surface of the cone. The points of intersection are $d_{1/2}$ with

$$d_1 = \mu(x)x + (1-\gamma^2)r + \gamma\sqrt{1-\gamma^2}\,\|r\|\,v, \qquad (4.3)$$
$$d_2 = \mu(x)x + (1-\gamma^2)r - \gamma\sqrt{1-\gamma^2}\,\|r\|\,v, \qquad (4.4)$$

where $v = x\times r/\|x\times r\|$. Therefore the line segment has the form (see Fig. 4.2)

$$S_\gamma(x) := \{d(t) := td_1 + (1-t)d_2;\ t \in [0, 1]\}. \qquad (4.5)$$
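The geometric facts used in (4.1)-(4.5) can be checked numerically. The Python sketch assumes $A = I$, $B = \mathrm{diag}(3, 2, 0.5)$, a positive $x$ and $\gamma = 0.6$ (all hypothetical) and verifies for $d_1$ and $d_2$ that $\|d_i - \mu(x)x\|^2 = (1-\gamma^2)\|r\|^2$, that the cosine of the angle to $r$ is $\sqrt{1-\gamma^2}$ (i.e. $\sin\varphi = \gamma$), that $\|Bx - d_i\| = \gamma\|r\|$ (so $d_i$ lies on the sphere bounding $\mathcal{B}_\gamma(x)$), and that $Bx - d_i$ is orthogonal to $d_i - \mu(x)x$ and to $x$ (tangency).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def cone_surface_points(mu_diag, x, gamma):
    """mu(x), Bx, r and the points d_1, d_2 of (4.3)-(4.4) for A = I, B = diag(mu_diag)."""
    bx = [m * c for m, c in zip(mu_diag, x)]
    mu = dot(x, bx) / dot(x, x)                          # mu(x) from (2.1) with A = I
    r = [b - mu * c for b, c in zip(bx, x)]              # r = Bx - mu(x) x
    xr = cross(x, r)
    v = [c / norm(xr) for c in xr]                       # v = x × r / ||x × r||
    ell = gamma * math.sqrt(1.0 - gamma ** 2) * norm(r)  # disc radius (4.2)
    center = [mu * c + (1.0 - gamma ** 2) * ri for c, ri in zip(x, r)]
    d1 = [c + ell * vi for c, vi in zip(center, v)]
    d2 = [c - ell * vi for c, vi in zip(center, v)]
    return mu, bx, r, d1, d2


def cone_checks(mu, bx, r, x, d):
    """Quantities that should equal 1 - gamma^2, sqrt(1 - gamma^2), gamma, 0, 0."""
    e = [di - mu * c for di, c in zip(d, x)]      # search direction d - mu(x) x
    to_center = [b - di for b, di in zip(bx, d)]  # Bx - d
    return (dot(e, e) / dot(r, r),
            dot(e, r) / (norm(e) * norm(r)),
            norm(to_center) / norm(r),
            dot(to_center, e),
            dot(to_center, x))


mu_diag = [3.0, 2.0, 0.5]  # hypothetical eigenvalues mu_1 > mu_2 > mu_3
x = [1.0, 0.7, 0.4]        # hypothetical nonnegative vector
gamma = 0.6

mu, bx, r, d1, d2 = cone_surface_points(mu_diag, x, gamma)
checks = [cone_checks(mu, bx, r, x, d) for d in (d1, d2)]
for row in checks:
    print(row)
```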

Lemma 4.1. The poorest convergence of PSD in 3D (aside from the singular cases in which PSD terminates in an eigenvector) is attained in $d_1$ or $d_2$ as given by (4.3) and (4.4).

Proof. The line segment $S_\gamma(x)$ has the form $d(t)$ with $t \in [0, 1]$ by (4.5). The PSD iteration maps it into a curve $w(t)$, $t \in [0, 1]$, where $w(t) = \mu(x)x + \vartheta_{\mathrm{opt}}(t)d(t)$ is the Ritz vector corresponding to the larger Ritz value in $\mathrm{span}\{x, d(t)\}$. (A singularity like the one mentioned at the beginning of Sec. 3 need not be considered, since otherwise the first alternative $\mu(x') > \mu_i$ in Thm. 2.2 applies and nothing is to be proved.) Along $w(t)$ we are looking for a vector $w^* = w(t^*)$ so that

$$\mu(w(t^*)) \le \mu(w(t)) \quad \forall t \in [0, 1].$$

Since $w(t)$ is a Ritz vector, its residual $Bw(t) - \mu(w(t))w(t)$ is orthogonal to the subspace spanned by $x$ and $d(t)$. As the residual is collinear to the gradient vector $\nabla\mu(w(t))$, we get

$$\big(\nabla\mu(w(t)), \mathrm{span}\{x, d(t)\}\big) = 0. \qquad (4.6)$$

A stationary point of the Rayleigh quotient in some $t \in (0, 1)$ is attained if

$$0 = \frac{d}{dt}\mu(w(t)) = \big(\nabla\mu(w(t)), w'(t)\big) = \big(\nabla\mu(w(t)), \vartheta_{\mathrm{opt}}'(t)d(t) + \vartheta_{\mathrm{opt}}(t)d'(t)\big) = \big(\nabla\mu(w(t)), \vartheta_{\mathrm{opt}}(t)d'(t)\big),$$

where (4.6) has been used for the last identity. As $d'(t)$ is collinear to $x\times r$, we get from $(\nabla\mu(w(t)), d'(t)) = 0$ together with (4.6) that $\nabla\mu(w(t)) = 0$ (since $x$, $d$ and $d'$ span $\mathbb{R}^3$). So any interior stationary point must be an eigenvector, and hence $\mu(w(t))$ takes its other extrema on the boundary, for $t = 0$ or $t = 1$, i.e., in $d_2$ or $d_1$. □
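Lemma 4.1 can be illustrated by sampling the segment (4.5): for each $t$ the larger Ritz value of $B$ on $\mathrm{span}\{x, d(t)\}$ is computed from the $2\times 2$ projected pencil (with $A = I$), and the smallest sampled value should occur at an endpoint. The data ($B = \mathrm{diag}(3, 2, 0.5)$, $x$, $\gamma = 0.6$) are hypothetical, and this is a numerical illustration, not a proof.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def larger_ritz_value(mu_diag, u, v):
    """Larger Ritz value of B = diag(mu_diag) (with A = I) on span{u, v}."""
    bu = [m * c for m, c in zip(mu_diag, u)]
    bv = [m * c for m, c in zip(mu_diag, v)]
    h11, h12, h22 = dot(u, bu), dot(u, bv), dot(v, bv)  # projection of B
    g11, g12, g22 = dot(u, u), dot(u, v), dot(v, v)     # Gram matrix
    q = g11 * g22 - g12 ** 2
    p = h11 * g22 + h22 * g11 - 2.0 * h12 * g12
    c = h11 * h22 - h12 ** 2
    return (p + math.sqrt(max(p * p - 4.0 * q * c, 0.0))) / (2.0 * q)


def segment_endpoints(mu_diag, x, gamma):
    """mu(x) and the directions d_1 - mu(x)x, d_2 - mu(x)x from (4.3), (4.4)."""
    bx = [m * c for m, c in zip(mu_diag, x)]
    mu = dot(x, bx) / dot(x, x)
    r = [b - mu * c for b, c in zip(bx, x)]
    xr = cross(x, r)
    s = gamma * math.sqrt(1.0 - gamma ** 2) * norm(r) / norm(xr)
    e1 = [(1.0 - gamma ** 2) * ri + s * ci for ri, ci in zip(r, xr)]
    e2 = [(1.0 - gamma ** 2) * ri - s * ci for ri, ci in zip(r, xr)]
    return mu, e1, e2


mu_diag = [3.0, 2.0, 0.5]  # hypothetical
x = [1.0, 0.7, 0.4]        # hypothetical nonnegative vector
gamma = 0.6
mu, e1, e2 = segment_endpoints(mu_diag, x, gamma)

# d(t) - mu(x)x = t (d_1 - mu(x)x) + (1 - t)(d_2 - mu(x)x); t = 0 gives d_2, t = 1 gives d_1
ts = [k / 200 for k in range(201)]
values = [larger_ritz_value(mu_diag, x, [t * a + (1.0 - t) * b for a, b in zip(e1, e2)])
          for t in ts]
print(values[0], values[-1], min(values))
```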

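Next we apply the Rayleigh-Ritz procedure to the two-dimensional subspaces $\mathrm{span}\{x, d_i - \mu(x)x\}$, $i = 1, 2$, in order to determine whether the poorest convergence is attained in $d_1$ or $d_2$. First, the Euclidean norm of $d_i - \mu(x)x$ is determined (note that $\|x\times r\| = \|x\|\,\|r\|$ since $x \perp r$):

$$\begin{aligned}\|d_{1/2} - \mu(x)x\|^2 &= (1-\gamma^2)^2(r, r) \pm 2(1-\gamma^2)\gamma\sqrt{1-\gamma^2}\,(r, x\times r)/\|x\| + \gamma^2(1-\gamma^2)\|x\times r\|^2/\|x\|^2\\ &= (1-\gamma^2)^2\|r\|^2 + \gamma^2(1-\gamma^2)\|r\|^2 = (1-\gamma^2)\|r\|^2.\end{aligned}$$

Hence the normalized search directions $\hat d_{1/2} := (d_{1/2} - \mu(x)x)/\|d_{1/2} - \mu(x)x\|$ are

$$\hat d_{1/2} = \sqrt{1-\gamma^2}\,\frac{r}{\|r\|} \pm \gamma\,\frac{x\times r}{\|x\times r\|},$$

and therefore, for normalized $x$ ($\|x\| = 1$), $V_1 = [x, \hat d_1]$ and $V_2 = [x, \hat d_2] \in \mathbb{R}^{3\times 2}$ are orthonormal matrices. The Ritz values of $B$ in the column space of $V_i$ are the eigenvalues of the projection

$$B_i := V_i^TBV_i = \begin{pmatrix}\mu(x) & (\hat d_i, Bx)\\ (\hat d_i, Bx) & \mu(\hat d_i)\end{pmatrix}.$$

The larger Ritz value (that is, the larger eigenvalue of $B_i$) reads

$$\theta_{2,i} = \frac{\mu(x) + \mu(\hat d_i)}{2} + \sqrt{\Big(\frac{\mu(x) - \mu(\hat d_i)}{2}\Big)^2 + (\hat d_i, Bx)^2}. \qquad (4.7)$$

In order to decide whether poorest convergence is attained in $d_1$ or in $d_2$, we show that the off-diagonal elements of $B_i$ do not depend on $i$, since

$$(\hat d_i, Bx) = (\hat d_i, Bx - \mu(x)x) = \|r\|\Big(\hat d_i, \frac{r}{\|r\|}\Big) = \|r\|\cos\angle(\hat d_i, r) = \sqrt{1-\gamma^2}\,\|r\|. \qquad (4.8)$$

Hence only the (2,2) element of $B_i$ depends on $i$. As further

$$\frac{\partial\theta_{2,i}}{\partial\mu(\hat d_i)} = \frac12 - \frac{\mu(x) - \mu(\hat d_i)}{4\sqrt{\big((\mu(x) - \mu(\hat d_i))/2\big)^2 + (\hat d_i, Bx)^2}} > 0$$

shows that $\theta_{2,i}$ is a monotone increasing function of $\mu(\hat d_i)$, we still have to find the $\hat d_i$ with the smaller Rayleigh quotient in order to find the search direction which is associated with the poorer PSD convergence.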
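A short numerical check of (4.7) and (4.8) under the same hypothetical data ($A = I$, $B = \mathrm{diag}(3, 2, 0.5)$, $\gamma = 0.6$, normalized positive $x$): both off-diagonal entries equal $\sqrt{1-\gamma^2}\|r\|$, the closed form (4.7) agrees with the larger Ritz value computed independently from the projected pencil, and (4.7) increases with $\mu(\hat d_i)$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def larger_ritz_value(mu_diag, u, v):
    """Larger Ritz value of B = diag(mu_diag) (with A = I) on span{u, v}."""
    bu = [m * c for m, c in zip(mu_diag, u)]
    bv = [m * c for m, c in zip(mu_diag, v)]
    h11, h12, h22 = dot(u, bu), dot(u, bv), dot(v, bv)
    g11, g12, g22 = dot(u, u), dot(u, v), dot(v, v)
    q = g11 * g22 - g12 ** 2
    p = h11 * g22 + h22 * g11 - 2.0 * h12 * g12
    c = h11 * h22 - h12 ** 2
    return (p + math.sqrt(max(p * p - 4.0 * q * c, 0.0))) / (2.0 * q)


def theta_formula(mu_x, mu_d, off):
    """Larger eigenvalue of [[mu_x, off], [off, mu_d]], Eq. (4.7)."""
    return (mu_x + mu_d) / 2.0 + math.sqrt(((mu_x - mu_d) / 2.0) ** 2 + off ** 2)


mu_diag = [3.0, 2.0, 0.5]  # hypothetical
gamma = 0.6
x = [1.0, 0.7, 0.4]
nx = norm(x)
x = [c / nx for c in x]    # ||x|| = 1, so V_i = [x, hat d_i] is orthonormal
bx = [m * c for m, c in zip(mu_diag, x)]
mu_x = dot(x, bx)
r = [b - mu_x * c for b, c in zip(bx, x)]
xr = cross(x, r)

entries = []
for sign in (1.0, -1.0):   # +1 gives hat d_1, -1 gives hat d_2
    d_hat = [math.sqrt(1.0 - gamma ** 2) * ri / norm(r) + sign * gamma * ci / norm(xr)
             for ri, ci in zip(r, xr)]
    off = dot(d_hat, bx)                                        # Eq. (4.8)
    mu_d = dot(d_hat, [m * c for m, c in zip(mu_diag, d_hat)])  # mu(hat d_i), ||hat d_i|| = 1
    entries.append((d_hat, off, mu_d, theta_formula(mu_x, mu_d, off)))

for d_hat, off, mu_d, theta in entries:
    print(off, math.sqrt(1.0 - gamma ** 2) * norm(r), theta)
```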

Lemma 4.2. PSD in 3D takes its poorest convergence, i.e., the smallest value of $\theta_2$, in

$$d = \mu(x)x + (1-\gamma^2)r + \gamma\sqrt{1-\gamma^2}\,\frac{x\times r}{\|x\|} \qquad (4.9)$$

if $x \in \mathbb{R}^3$ is a componentwise nonnegative vector (cf. Sec. 4.1). The associated Ritz value is accessible from (4.7).

Proof. We show that $\theta_{2,1}$ is the smaller of the two values $\theta_{2,1}$, $\theta_{2,2}$ by showing (we use the monotonicity of $\theta_{2,i}$ as a function of $\mu(\hat d_i)$) that $\mu(\hat d_1) < \mu(\hat d_2)$. This inequality is true if $(r, B(x\times r)) < 0$. By using $\mathrm{span}\{x, r\} \perp x\times r$ and $r \perp x$, direct computation results in

$$\begin{aligned}(r, B(x\times r)) &= (B(Bx - \mu(x)x), x\times r) = (B^2x, x\times r) - \mu(x)(Bx, x\times r)\\ &= (B^2x, x\times r) - \mu(x)(r + \mu(x)x, x\times r)\\ &= (B^2x, x\times r) = (r, B^2x\times x) = (Bx, B^2x\times x)\\ &= -x_1x_2x_3(\mu_1 - \mu_2)(\mu_1 - \mu_3)(\mu_2 - \mu_3) < 0.\end{aligned}$$

The last inequality holds since $x > 0$ and $\mu_1 > \mu_2 > \mu_3$. □
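The sign identity in the proof of Lemma 4.2 can be tested for a few positive vectors. The Python sketch compares $(r, B(x\times r))$ with $-x_1x_2x_3(\mu_1-\mu_2)(\mu_1-\mu_3)(\mu_2-\mu_3)$ and checks $\mu(\hat d_1) < \mu(\hat d_2)$; the diagonal $B$ (with $A = I$), the sample vectors and $\gamma = 0.6$ are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def triple_product_check(mu_diag, x, gamma):
    """(r, B(x × r)), its closed form, and mu(hat d_1), mu(hat d_2)."""
    bx = [m * c for m, c in zip(mu_diag, x)]
    mu = dot(x, bx) / dot(x, x)
    r = [b - mu * c for b, c in zip(bx, x)]
    xr = cross(x, r)
    lhs = dot(r, [m * c for m, c in zip(mu_diag, xr)])  # (r, B(x × r))
    m1, m2, m3 = mu_diag
    rhs = -x[0] * x[1] * x[2] * (m1 - m2) * (m1 - m3) * (m2 - m3)
    mus = []
    for sign in (1.0, -1.0):  # +1 gives hat d_1, -1 gives hat d_2
        d_hat = [math.sqrt(1.0 - gamma ** 2) * ri / norm(r) + sign * gamma * ci / norm(xr)
                 for ri, ci in zip(r, xr)]
        mus.append(dot(d_hat, [m * c for m, c in zip(mu_diag, d_hat)]))
    return lhs, rhs, mus[0], mus[1]


mu_diag = [3.0, 2.0, 0.5]  # hypothetical mu_1 > mu_2 > mu_3
samples = [[1.0, 0.7, 0.4], [0.2, 1.5, 3.0], [2.0, 0.1, 0.9]]
results = [triple_product_check(mu_diag, x, 0.6) for x in samples]
for lhs, rhs, mu_d1, mu_d2 in results:
    print(lhs, rhs, mu_d1 < mu_d2)
```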

4.3. A mini-dimensional convergence analysis of PSD. Due to Thm. 3.1 the "mini-dimensional" convergence analysis can be restricted to three-dimensional $B$-invariant subspaces of $\mathbb{R}^n$. With respect to the basis of eigenvectors these subspaces have the form $\mathrm{span}\{e_j, e_k, e_l\}$, where $e_m$ denotes the $m$-th unit vector. The associated eigenvalues are indexed so that $\mu_j > \mu_k > \mu_l$.

Lemma 4.2 delivers for any $x \in \mathcal{L}(\mu)$ in 3D the vector of $\mathcal{B}_\gamma(x)$-poorest PSD convergence. Next we have to analyze the $\mathcal{L}(\mu)$-dependence of the poorest convergence case.

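Theorem 4.3. In the three-dimensional space $\mathrm{span}\{e_j, e_k, e_l\}$ the following sharp estimate for PSD holds:

$$\frac{\Delta_{j,k}(\mu')}{\Delta_{j,k}(\mu)} \le \left(\frac{\kappa + \gamma(2-\kappa)}{(2-\kappa) + \gamma\kappa}\right)^2$$

with

$$\Delta_{j,k}(\xi) = \frac{\mu_j - \xi}{\xi - \mu_k} \quad\text{and}\quad \kappa = \frac{\mu_k - \mu_l}{\mu_j - \mu_l}.$$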
Proof. The starting point of the following analysis are the vectors $x$ and

$$d = \mu(x)x + (1-\gamma^2)r + \gamma\sqrt{1-\gamma^2}\,\frac{x\times r}{\|x\|}.$$

Without loss of generality $x$ can be scaled so that

$$x = e_j + \alpha_0e_k + \beta_0e_l;$$

hence $x$ is an element of the affine space $E_j := e_j + \mathrm{span}\{e_k, e_l\}$. The coordinate form of $x$ in 3D then is $x = (1, \alpha_0, \beta_0)^T$. Further, let $\bar d = (1, \alpha, \beta)^T \in E_j$ be the corresponding multiple of $d$. Since $\mathrm{span}\{x, d\}$ is a tangential plane of the ball $\mathcal{B}_\gamma(x)$ in $d$ and $Bx - d$ is a radius vector of the ball, it holds that

$$Bx - d \perp \mathrm{span}\{x, d\} = \mathrm{span}\{x, \bar d\}. \qquad (4.10)$$

Hence $Bx - d$ is collinear to

$$x\times\bar d = (\alpha_0\beta - \alpha\beta_0,\ \beta_0 - \beta,\ \alpha - \alpha_0)^T.$$

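By $S_1 = (1, c_k, 0)^T$ and $S_2 = (1, 0, c_l)^T$ with $S_1, S_2 \in E_j$ we denote the points of intersection of $\mathrm{span}\{x, d\}$ with $e_j + \mathrm{span}\{e_k\}$ and $e_j + \mathrm{span}\{e_l\}$, see Fig. 4.3. Due to (4.10) it holds that $(Bx - d, S_i) = 0$, $i = 1, 2$. Since

$$Bx - d = \gamma^2r - \gamma\sqrt{1-\gamma^2}\,\frac{x\times r}{\|x\|},$$

we get with $\mu = \mu(x)$ and

$$r = \begin{pmatrix}\mu_j - \mu\\ \alpha_0(\mu_k - \mu)\\ \beta_0(\mu_l - \mu)\end{pmatrix}, \qquad x\times r = \begin{pmatrix}\alpha_0\beta_0(\mu_l - \mu_k)\\ \beta_0(\mu_j - \mu_l)\\ \alpha_0(\mu_k - \mu_j)\end{pmatrix}$$

from $(Bx - d, S_1) = 0$ that

$$c_k = -\frac{(Bx - d)_1}{(Bx - d)_2} = \frac{\|x\|(\mu_j - \mu) + \tau\alpha_0\beta_0(\mu_k - \mu_l)}{\|x\|\alpha_0(\mu - \mu_k) + \tau\beta_0(\mu_j - \mu_l)}, \qquad (4.11)$$

and $(Bx - d, S_2) = 0$ results in

$$c_l = -\frac{(Bx - d)_1}{(Bx - d)_3} = \frac{\|x\|(\mu_j - \mu) + \tau\alpha_0\beta_0(\mu_k - \mu_l)}{\|x\|\beta_0(\mu - \mu_l) + \tau\alpha_0(\mu_k - \mu_j)} \qquad (4.12)$$

with $\tau = \sqrt{1-\gamma^2}/\gamma$.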
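Formulas (4.11) and (4.12) can be cross-checked against a direct computation of the intersection points: writing $(1, c_k, 0)^T$ and $(1, 0, c_l)^T$ as linear combinations of $x$ and $d$ gives two small linear systems that can be solved by hand. The Python sketch assumes $B = \mathrm{diag}(\mu_j, \mu_k, \mu_l) = \mathrm{diag}(3, 2, 0.5)$, $A = I$, $x = (1, 0.7, 0.4)^T$ and $\gamma = 0.6$ (hypothetical values with $\mu_k < \mu(x) < \mu_j$).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def psd_direction(mus, x, gamma):
    """mu(x) and the direction d of (4.9) for B = diag(mu_j, mu_k, mu_l), A = I."""
    bx = [m * c for m, c in zip(mus, x)]
    mu = dot(x, bx) / dot(x, x)
    r = [b - mu * c for b, c in zip(bx, x)]
    xr = cross(x, r)
    s = gamma * math.sqrt(1.0 - gamma ** 2) / norm(x)
    d = [mu * c + (1.0 - gamma ** 2) * ri + s * ci for c, ri, ci in zip(x, r, xr)]
    return mu, d


def intersections_direct(x, d):
    """c_k, c_l with (1, c_k, 0) and (1, 0, c_l) in span{x, d}, for x = (1, alpha0, beta0)."""
    a0, b0 = x[1], x[2]
    ck = (b0 * d[1] - a0 * d[2]) / (b0 * d[0] - d[2])
    cl = (a0 * d[2] - b0 * d[1]) / (a0 * d[0] - d[1])
    return ck, cl


def intersections_formula(mus, x, gamma, mu):
    """c_k and c_l from (4.11) and (4.12)."""
    mj, mk, ml = mus
    a0, b0 = x[1], x[2]
    nx = norm(x)
    tau = math.sqrt(1.0 - gamma ** 2) / gamma
    numerator = nx * (mj - mu) + tau * a0 * b0 * (mk - ml)
    ck = numerator / (nx * a0 * (mu - mk) + tau * b0 * (mj - ml))
    cl = numerator / (nx * b0 * (mu - ml) + tau * a0 * (mk - mj))
    return ck, cl


mus = [3.0, 2.0, 0.5]  # hypothetical mu_j > mu_k > mu_l
x = [1.0, 0.7, 0.4]    # x = e_j + alpha0 e_k + beta0 e_l
gamma = 0.6
mu, d = psd_direction(mus, x, gamma)
direct = intersections_direct(x, d)
formula = intersections_formula(mus, x, gamma, mu)
print(direct, formula)
```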

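Any $x \in E_j \cap \mathcal{L}(\mu)$ is an element of the ellipse $(x_k/a)^2 + (x_l/b)^2 = 1$ with

$$a = \sqrt{\frac{\mu_j - \mu}{\mu - \mu_k}}, \qquad b = \sqrt{\frac{\mu_j - \mu}{\mu - \mu_l}}.$$

As justified in Sec. 4.1, the analysis can be restricted to componentwise nonnegative $x = (1, \alpha_0, \beta_0)^T$, so that its components $\alpha_0$ and $\beta_0$ can be represented in terms of $\psi \in (0, \pi/2)$ and $t = \tan\psi$:

$$\alpha_0 = a\cos\psi = \frac{a}{\sqrt{1+t^2}}, \qquad \beta_0 = b\sin\psi = \frac{bt}{\sqrt{1+t^2}}. \qquad (4.13)$$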
Two further ellipses in $E_j$ are relevant for the subsequent analysis. These ellipses are very similar, each centered in $e_j$ (the origin of $E_j$) and each tangential to the line through $S_1$ and $S_2$. The first ellipse is $E_j \cap \mathcal{L}(\mu')$ with $\mu' = \mu(x')$ and has the semi-axes

$$a' = \sqrt{\frac{\mu_j - \mu'}{\mu' - \mu_k}}, \qquad b' = \sqrt{\frac{\mu_j - \mu'}{\mu' - \mu_l}}.$$

This ellipse is tangential to the line through $S_1$ and $S_2$ since $\mu(x')$ is associated with the poorest convergence on the cone $\mathcal{F}_\gamma(x)$ projected to $E_j$. Direct computation shows that $a'/b' < a/b$.

The second ellipse $\hat E$, see Fig. 4.4, has the semi-axes $\hat a$ and $\hat b$ so that the ratio of its semi-axes equals that of $E_j \cap \mathcal{L}(\mu)$. This means that $\hat a/\hat b = a/b$. It holds that $\hat a > a'$, since otherwise a contradiction can be derived. Assuming $\hat a \le a'$, for any point $(\alpha, \beta)$ on the ellipse $\hat E$ it holds that (by using $a'/b' < a/b$)

$$\alpha^2 + \frac{a'^2}{b'^2}\beta^2 < \alpha^2 + \frac{a^2}{b^2}\beta^2 = \alpha^2 + \frac{\hat a^2}{\hat b^2}\beta^2 = \hat a^2 \le a'^2,$$

so that $\alpha^2/a'^2 + \beta^2/b'^2 < 1$. The latter inequality means that the ellipse $\hat E$ is completely surrounded by the ellipse $\mathcal{L}(\mu') \cap E_j$, which contradicts its tangentiality to the line through $S_1$ and $S_2$. Hence

$$\Delta_{j,k}(\mu') = \frac{\mu_j - \mu'}{\mu' - \mu_k} = a'^2 < \hat a^2,$$

and an upper bound for $\tilde a^2/\Delta(\mu) = \tilde a^2/a^2$ remains to be determined. Next we show that
$$\tilde a^2 = \frac{a^2 c_k^2 c_l^2}{b^2 c_k^2 + a^2 c_l^2} \tag{4.14}$$
(the case $c_l \to \infty$ is to be treated separately by analyzing the limits of $c_k$ and $c_l$).



To prove this, we determine the point of contact of the line through $S_1$ and $S_2$ and the ellipse $\tilde E$. The semi-axes of $\tilde E$ are $\tilde a$ and $\tilde b = b\tilde a/a$. By rescaling the second semi-axis with the factor $a/b$, the ellipse becomes a circle with radius $\tilde a$, and the point of contact does not change. Further, the line segment connecting $S_1$ and $S_2$ is transformed into
$$s(\sigma) = \begin{pmatrix} 0 \\ a c_l/b \end{pmatrix} + \sigma\begin{pmatrix} c_k \\ -a c_l/b \end{pmatrix}, \qquad \sigma \in [0, 1].$$
The point of contact is the point on $s(\sigma)$ with the smallest Euclidean norm. From
$$\|s(\sigma)\|^2 = \sigma^2 c_k^2 + \left(\frac{a c_l}{b}\right)^2 (\sigma - 1)^2$$
direct computation shows that the minimum is attained in $\sigma^* = a^2 c_l^2/(b^2 c_k^2 + a^2 c_l^2)$. The resulting identity $\tilde a^2 = \|s(\sigma^*)\|^2$ yields (4.14).
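The tangency argument behind (4.14) reduces to the distance from the origin to a line, which the following Python sketch reproduces. The inputs $a, b, c_k, c_l$ are arbitrary positive sample values and the function names are hypothetical.

```python
import numpy as np

def tilde_a_squared(a, b, c_k, c_l):
    """Closed form (4.14)."""
    return a**2 * c_k**2 * c_l**2 / (b**2 * c_k**2 + a**2 * c_l**2)

def contact_point(a, b, c_k, c_l):
    """Point of s(sigma) = p0 + sigma * v nearest to the origin (rescaled segment S_2 -> S_1)."""
    p0 = np.array([0.0, a / b * c_l])        # rescaled S_2, sigma = 0
    v = np.array([c_k, -a / b * c_l])         # direction towards S_1, reached at sigma = 1
    sigma_star = -(p0 @ v) / (v @ v)          # minimizer of ||p0 + sigma v||^2
    return sigma_star, p0 + sigma_star * v

a, b, c_k, c_l = 1.2, 0.8, 2.0, 1.5
sigma_star, s = contact_point(a, b, c_k, c_l)
print(sigma_star, a**2 * c_l**2 / (b**2 * c_k**2 + a**2 * c_l**2))
print(s @ s, tilde_a_squared(a, b, c_k, c_l))
```

Both lines print matching pairs: the generic projection formula for $\sigma^*$ agrees with the closed form quoted above, and $\|s(\sigma^*)\|^2$ agrees with (4.14).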

Inserting (4.11), (4.12) and (4.13) in (4.14) and using the variables $\tau := \sqrt{1-\gamma^2}/\gamma \in (0, \infty]$, $\Delta = a^2$ and $b^2 = \Delta(1-\kappa)/(1+\kappa\Delta)$ with
$$\kappa := \frac{\mu_k - \mu_l}{\mu_j - \mu_l}$$
results in a representation of $\tilde a^2/a^2$ as a function of $t$, $\Delta$, $\tau$ and $\kappa$. (The limit $\tau \to \infty$ needs additional care; however, this limit corresponds to $\gamma = 0$, and for $\gamma = 0$ Thm. 2.2 is already proved in [IB].) The details are as follows. With

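$$A = \sqrt{1 + \alpha_0^2 + \beta_0^2}\,(\mu_j - \mu) + \tau\alpha_0\beta_0(\mu_k - \mu_l),$$
$$B = \sqrt{1 + \alpha_0^2 + \beta_0^2}\,\alpha_0(\mu - \mu_k) + \tau\beta_0(\mu_j - \mu_l),$$
$$C = \sqrt{1 + \alpha_0^2 + \beta_0^2}\,\beta_0(\mu - \mu_l) + \tau\alpha_0(\mu_k - \mu_j)$$
it holds that $c_k = A/B$ and $c_l = A/C$. Instead of considering $\tilde a^2/a^2$ it is more convenient to estimate its reciprocal from below. From (4.14) one gets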

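$$\frac{a^2}{\tilde a^2} = \frac{\Delta(1-\kappa)}{1+\kappa\Delta}\left(\frac{C}{A}\right)^2 + \Delta\left(\frac{B}{A}\right)^2$$
with
$$\frac{B}{A} = \frac{\sqrt{1+\alpha_0^2+\beta_0^2}\,\alpha_0 + \tau\beta_0\,\frac{\mu_j-\mu_l}{\mu-\mu_k}}{\sqrt{1+\alpha_0^2+\beta_0^2}\,\Delta + \tau\alpha_0\beta_0\,\frac{\mu_k-\mu_l}{\mu-\mu_k}}, \qquad \frac{C}{A} = \frac{\sqrt{1+\alpha_0^2+\beta_0^2}\,\beta_0 + \tau\alpha_0\,\frac{\mu_k-\mu_j}{\mu-\mu_l}}{\sqrt{1+\alpha_0^2+\beta_0^2}\,b^2 + \tau\alpha_0\beta_0\,\frac{\mu_k-\mu_l}{\mu-\mu_l}}.$$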

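In these formulae the ratios of eigenvalue differences are to be expressed in terms of $\Delta$ and $\kappa$. Therefore let $U := \mu_j - \mu$, $V := \mu - \mu_k$ and $W := \mu - \mu_l$, so that $\mu_k - \mu_l = W - V$, $\mu_j - \mu_l = U + W$ and $\mu_k - \mu_j = -U - V$. Since $\Delta = U/V$ and $\Delta(1-\kappa)/(1+\kappa\Delta) = U/W$, we get that
$$\frac{\mu_k - \mu_j}{\mu - \mu_l} = -\frac{U}{W}\left(1 + \frac{V}{U}\right) = \frac{(\kappa-1)(1+\Delta)}{1+\kappa\Delta},$$
$$\frac{\mu_k - \mu_l}{\mu - \mu_l} = 1 - \frac{V}{U}\,\frac{U}{W} = \frac{\kappa(1+\Delta)}{1+\kappa\Delta},$$
$$\frac{\mu_j - \mu_l}{\mu - \mu_k} = \frac{U}{V}\left(1 + \frac{W}{U}\right) = \frac{1+\Delta}{1-\kappa},$$
$$\frac{\mu_k - \mu_l}{\mu - \mu_k} = \frac{W}{U}\,\frac{U}{V} - 1 = \frac{\kappa(1+\Delta)}{1-\kappa}.$$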
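The four ratio identities, together with $b^2 = \Delta(1-\kappa)/(1+\kappa\Delta)$, are easy to spot-check numerically. A minimal Python sketch, with a hypothetical helper name and arbitrary sample values $\mu_l < \mu_k < \mu < \mu_j$:

```python
def ratio_identities(mu, mu_j, mu_k, mu_l):
    """Pairs (direct ratio, expression in Delta and kappa); needs mu_l < mu_k < mu < mu_j."""
    Delta = (mu_j - mu) / (mu - mu_k)
    kappa = (mu_k - mu_l) / (mu_j - mu_l)
    return [
        ((mu_k - mu_j) / (mu - mu_l), (kappa - 1) * (1 + Delta) / (1 + kappa * Delta)),
        ((mu_k - mu_l) / (mu - mu_l), kappa * (1 + Delta) / (1 + kappa * Delta)),
        ((mu_j - mu_l) / (mu - mu_k), (1 + Delta) / (1 - kappa)),
        ((mu_k - mu_l) / (mu - mu_k), kappa * (1 + Delta) / (1 - kappa)),
        ((mu_j - mu) / (mu - mu_l), Delta * (1 - kappa) / (1 + kappa * Delta)),  # b^2
    ]

for lhs, rhs in ratio_identities(2.4, 3.0, 2.0, 1.0):
    print(lhs, rhs)  # each pair agrees up to rounding
```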

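Therefore we have
$$\frac{a^2}{\tilde a^2} = \frac{\Delta(1-\kappa)}{1+\kappa\Delta}\left(\frac{\sqrt{1+\alpha_0^2+\beta_0^2}\,\beta_0 + \tau\alpha_0\frac{(\kappa-1)(1+\Delta)}{1+\kappa\Delta}}{\sqrt{1+\alpha_0^2+\beta_0^2}\,\frac{\Delta(1-\kappa)}{1+\kappa\Delta} + \tau\alpha_0\beta_0\frac{\kappa(1+\Delta)}{1+\kappa\Delta}}\right)^2 + \Delta\left(\frac{\sqrt{1+\alpha_0^2+\beta_0^2}\,\alpha_0 + \tau\beta_0\frac{1+\Delta}{1-\kappa}}{\sqrt{1+\alpha_0^2+\beta_0^2}\,\Delta + \tau\alpha_0\beta_0\frac{\kappa(1+\Delta)}{1-\kappa}}\right)^2.$$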
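Insertion of (4.13) yields $f := f(\Delta, t, \kappa, \tau)$ with
$$f = \frac{a^2}{\tilde a^2} = \frac{(1+\Delta)\left(\tau^2(1-\kappa)^2 + \kappa(1-\kappa) + \tau^2 t^2\right) + (1-\kappa)^2 + t^2(1-\kappa) + 2\kappa\tau t\sqrt{\frac{1}{1+t^2}}\sqrt{1+t^2+\kappa\Delta}\sqrt{1-\kappa}\sqrt{1+\Delta}}{\left(\sqrt{1-\kappa}\sqrt{1+t^2+\kappa\Delta} + \kappa\tau t\sqrt{\frac{1}{1+t^2}}\sqrt{1+\Delta}\right)^2}.$$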
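The closed form of $f$ can be cross-checked against the geometric construction. The Python sketch below is a hypothetical numerical check, not part of the original argument: it chains (4.13), (4.11), (4.12) and (4.14) for sample values $\mu_l < \mu_k < \mu < \mu_j$ and compares the resulting $a^2/\tilde a^2$ with $f(\Delta, t, \kappa, \tau)$; all function names are illustrative.

```python
import numpy as np

def f_closed_form(Delta, t, kappa, tau):
    """f(Delta, t, kappa, tau) = a^2 / tilde_a^2 in the closed form stated above."""
    s = np.sqrt(1.0 / (1.0 + t**2))
    q = np.sqrt(1.0 + t**2 + kappa * Delta)
    num = ((1 + Delta) * (tau**2 * (1 - kappa)**2 + kappa * (1 - kappa) + tau**2 * t**2)
           + (1 - kappa)**2 + t**2 * (1 - kappa)
           + 2 * kappa * tau * t * s * q * np.sqrt(1 - kappa) * np.sqrt(1 + Delta))
    den = (np.sqrt(1 - kappa) * q + kappa * tau * t * s * np.sqrt(1 + Delta))**2
    return num / den

def f_from_geometry(mu, mu_j, mu_k, mu_l, t, gamma):
    """a^2 / tilde_a^2 obtained via (4.13), then (4.11) and (4.12), then (4.14)."""
    a = np.sqrt((mu_j - mu) / (mu - mu_k))          # semi-axes of E_j intersected with L(mu)
    b = np.sqrt((mu_j - mu) / (mu - mu_l))
    alpha0 = a / np.sqrt(1 + t**2)                  # (4.13)
    beta0 = b * t / np.sqrt(1 + t**2)
    tau = np.sqrt(1 - gamma**2) / gamma
    nx = np.sqrt(1 + alpha0**2 + beta0**2)          # ||x|| for x = (1, alpha0, beta0)
    A = nx * (mu_j - mu) + tau * alpha0 * beta0 * (mu_k - mu_l)
    B = nx * alpha0 * (mu - mu_k) + tau * beta0 * (mu_j - mu_l)
    C = nx * beta0 * (mu - mu_l) + tau * alpha0 * (mu_k - mu_j)
    c_k, c_l = A / B, A / C                          # (4.11), (4.12)
    tilde_a2 = a**2 * c_k**2 * c_l**2 / (b**2 * c_k**2 + a**2 * c_l**2)  # (4.14)
    return a**2 / tilde_a2

mu_j, mu_k, mu_l, mu, t, gamma = 3.0, 2.0, 1.0, 2.5, 0.7, 0.4
Delta = (mu_j - mu) / (mu - mu_k)
kappa = (mu_k - mu_l) / (mu_j - mu_l)
tau = np.sqrt(1 - gamma**2) / gamma
print(f_from_geometry(mu, mu_j, mu_k, mu_l, t, gamma), f_closed_form(Delta, t, kappa, tau))
```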
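This function is monotone increasing in $\Delta$ since $\partial f/\partial\Delta$ equals
$$\frac{\tau^2\sqrt{1-\kappa}\left((1-\kappa)^3 + 3(1-\kappa)^2 t^2 + 3(1-\kappa)t^4 + t^6\right)}{(1+t^2)\sqrt{1+t^2+\kappa\Delta}\left(\sqrt{1-\kappa}\sqrt{1+t^2+\kappa\Delta} + \kappa\tau t\sqrt{\frac{1}{1+t^2}}\sqrt{1+\Delta}\right)^3} > 0.$$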
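The stated derivative can be compared with a central finite difference of $f$. This Python sketch (hypothetical names, arbitrary sample parameters with $\kappa \in (0,1)$, $t, \tau > 0$) does that and also confirms that $f$ grows along a grid in $\Delta$.

```python
import numpy as np

def f_closed_form(Delta, t, kappa, tau):
    """f(Delta, t, kappa, tau) as stated above."""
    s = np.sqrt(1.0 / (1.0 + t**2))
    q = np.sqrt(1.0 + t**2 + kappa * Delta)
    num = ((1 + Delta) * (tau**2 * (1 - kappa)**2 + kappa * (1 - kappa) + tau**2 * t**2)
           + (1 - kappa)**2 + t**2 * (1 - kappa)
           + 2 * kappa * tau * t * s * q * np.sqrt(1 - kappa) * np.sqrt(1 + Delta))
    den = (np.sqrt(1 - kappa) * q + kappa * tau * t * s * np.sqrt(1 + Delta))**2
    return num / den

def df_dDelta(Delta, t, kappa, tau):
    """Partial derivative of f with respect to Delta, as stated above."""
    q = np.sqrt(1 + t**2 + kappa * Delta)
    M = np.sqrt(1 - kappa) * q + kappa * tau * t * np.sqrt(1 / (1 + t**2)) * np.sqrt(1 + Delta)
    num = tau**2 * np.sqrt(1 - kappa) * ((1 - kappa)**3 + 3 * (1 - kappa)**2 * t**2
                                         + 3 * (1 - kappa) * t**4 + t**6)
    return num / ((1 + t**2) * q * M**3)

h = 1e-5
Delta, t, kappa, tau = 1.0, 1.0, 0.5, 1.0
fd = (f_closed_form(Delta + h, t, kappa, tau) - f_closed_form(Delta - h, t, kappa, tau)) / (2 * h)
print(fd, df_dDelta(Delta, t, kappa, tau))   # the two values agree closely
values = f_closed_form(np.linspace(0.0, 10.0, 101), t, kappa, tau)
print(np.all(np.diff(values) > 0))           # monotone increasing in Delta
```

The numerator is $\tau^2\sqrt{1-\kappa}\,(1-\kappa+t^2)^3$, which makes the sign of $\partial f/\partial\Delta$ evident.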
Therefore $f(0, t, \kappa, \tau)$ is a lower bound for $a^2/\tilde a^2$, which reads
$$f(0, t, \kappa, \tau) = \frac{(1+t^2)\left(\tau^2(1-\kappa)^2 + (1+t^2)(1-\kappa) + \tau^2 t^2 + 2\kappa\tau t\sqrt{1-\kappa}\right)}{\left(\sqrt{1-\kappa}\,(1+t^2) + \kappa\tau t\right)^2}.$$



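The parameter $t$ determines the choice of $x$ in the level set $\mathcal{L}(\mu)$. The derivative with respect to $t$ reads
$$\frac{\partial f(0, t, \kappa, \tau)}{\partial t} = \frac{2\kappa\tau^2(1-\kappa+t^2)\left(\tau t^2 + 2t\sqrt{1-\kappa} - \tau(1-\kappa)\right)}{\left(\sqrt{1-\kappa}\,(1+t^2) + \kappa\tau t\right)^3}.$$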



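The two real zeros of this derivative are
$$t_{1,2} = \frac{\sqrt{1-\kappa}\left(-1 \pm \sqrt{1+\tau^2}\right)}{\tau}.$$
Since the derivative is negative for $0 < t < t_1$ and positive for $t > t_1$, the global minimum on $t > 0$ is taken in
$$0 < t_1 = \frac{\sqrt{1-\kappa}\left(-1 + \sqrt{1+\tau^2}\right)}{\tau} = \frac{\sqrt{1-\kappa}\,(1-\gamma)}{\sqrt{1-\gamma^2}}.$$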

Therefore the minimum is given by
$$f(0, t_1, \kappa, \tau) = \left(\frac{(2-\kappa) + \gamma\kappa}{\kappa + \gamma(2-\kappa)}\right)^2,$$

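and its inverse yields the desired convergence estimate
$$\frac{\Delta(\mu')}{\Delta(\mu)} \le \left(\frac{\tilde a}{a}\right)^2 \le \frac{1}{f(0, t_1, \kappa, \tau)} = \left(\frac{\kappa + \gamma(2-\kappa)}{(2-\kappa) + \gamma\kappa}\right)^2.$$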

This estimate is sharp since for $\Delta = 0$ the right inequality turns into an identity. Further, $\Delta \to 0$ implies $\mu(x) \to \mu_j$ and also $\mu(x') \to \mu_j$, so that $\lim a/b = \lim a'/b' = 1/\sqrt{1-\kappa}$; in this limit $\mathcal{L}(\mu') \cap E_j$ and $\tilde E$ coincide, which implies that the left inequality also turns into an identity. □
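The minimization over $t$ in the proof above can be reproduced numerically. The Python sketch below (names and sample values hypothetical) evaluates $f(0,t,\kappa,\tau)$ on a fine grid, compares the grid minimizer with $t_1$, and checks that $1/f(0,t_1,\kappa,\tau)$ equals the squared convergence factor.

```python
import numpy as np

def f0(t, kappa, tau):
    """f(0, t, kappa, tau) as stated above; works elementwise on arrays."""
    p = np.sqrt(1 - kappa)
    num = (1 + t**2) * (tau**2 * (1 - kappa)**2 + (1 + t**2) * (1 - kappa)
                        + tau**2 * t**2 + 2 * kappa * tau * t * p)
    return num / (p * (1 + t**2) + kappa * tau * t)**2

def sigma_kappa(kappa, gamma):
    """Convergence factor (kappa + gamma(2 - kappa)) / ((2 - kappa) + gamma kappa)."""
    return (kappa + gamma * (2 - kappa)) / ((2 - kappa) + gamma * kappa)

kappa, gamma = 0.3, 0.6
tau = np.sqrt(1 - gamma**2) / gamma
t1 = np.sqrt(1 - kappa) * (1 - gamma) / np.sqrt(1 - gamma**2)   # closed-form minimizer
ts = np.linspace(1e-4, 20.0, 200001)
t_grid = ts[np.argmin(f0(ts, kappa, tau))]                       # brute-force minimizer
print(t1, t_grid)
print(1 / f0(t1, kappa, tau), sigma_kappa(kappa, gamma)**2)
```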

Proof [of Theorem 2.2 and Theorem 1.2]. Let $\mu = \mu(x) \in (\mu_{i+1}, \mu_i)$. Theorem 3.1 proves that the poorest convergence is attained in a three-dimensional invariant subspace. Theorem 4.3 proves in $\operatorname{span}\{e_j, e_k, e_l\}$ that
$$\Delta_{j,k}(\mu') \le \left(\frac{\kappa + \gamma(2-\kappa)}{(2-\kappa) + \gamma\kappa}\right)^2 \Delta_{j,k}(\mu).$$



It either holds that $\mu_l < \mu_{i+1} < \mu(x) < \mu_i = \mu_k < \mu_j$ or that $\mu_l < \mu_k \le \mu_{i+1} < \mu(x) < \mu_i \le \mu_j$. In the first case the Ritz value $\mu(x')$ in $\operatorname{span}\{e_j, e_k, e_l\}$ satisfies $\mu_k < \mu(x')$, which is the first alternative in Thm. 2.2. To analyze the second case, we note that the convergence factor is a monotone increasing function in $\kappa \in (0, 1)$ since
$$\frac{\partial}{\partial\kappa}\,\frac{\kappa + \gamma(2-\kappa)}{(2-\kappa) + \gamma\kappa} = \frac{2(1-\gamma^2)}{\left((2-\kappa) + \gamma\kappa\right)^2} > 0.$$

Further, $\kappa = (\mu_k - \mu_l)/(\mu_j - \mu_l)$ is a monotone decreasing function in $\mu_j$ and $\mu_l$ and a monotone increasing function in $\mu_k$. Hence, subject to $\mu_j \ge \mu_i$, $\mu_k \le \mu_{i+1}$ and $\mu_l \ge \mu_n$, the poorest convergence, with the maximal convergence factor, is attained for $j = i$, $k = i + 1$ and $l = n$, which proves Thm. 2.2:

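$$\Delta_{i,i+1}(\mu') \le \left(\frac{\kappa + \gamma(2-\kappa)}{(2-\kappa) + \gamma\kappa}\right)^2 \Delta_{i,i+1}(\mu), \qquad \kappa = \frac{\mu_{i+1} - \mu_n}{\mu_i - \mu_n}.$$
Thm. 1.2 follows by inserting the reciprocals of the eigenvalues and Ritz values. □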
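The last steps of this proof, the monotonicity of $\kappa$ in $\mu_j, \mu_k, \mu_l$ and the passage to Theorem 1.2 via $\mu = 1/\lambda$, can be illustrated numerically. The sketch assumes $\Delta_{j,k}(\mu) = (\mu_j-\mu)/(\mu-\mu_k)$ as in the proof of Theorem 4.3 and $\Delta_{i,i+1}(\xi) = (\xi-\lambda_i)/(\lambda_{i+1}-\xi)$ as in the conclusions; the helper names and sample values are hypothetical.

```python
def kappa_mu(mu_j, mu_k, mu_l):
    """kappa = (mu_k - mu_l) / (mu_j - mu_l) in the reciprocal (mu) setting."""
    return (mu_k - mu_l) / (mu_j - mu_l)

def kappa_lambda(lam_i, lam_ip1, lam_n):
    """The same kappa expressed by the eigenvalues lambda_i < lambda_{i+1} < lambda_n."""
    return lam_i * (lam_n - lam_ip1) / (lam_ip1 * (lam_n - lam_i))

def sigma(kappa, gamma):
    """Convergence factor (kappa + gamma(2 - kappa)) / ((2 - kappa) + gamma kappa)."""
    return (kappa + gamma * (2 - kappa)) / ((2 - kappa) + gamma * kappa)

def delta_mu(mu, mu_j, mu_k):
    return (mu_j - mu) / (mu - mu_k)

def delta_lambda(lam, lam_i, lam_ip1):
    return (lam - lam_i) / (lam_ip1 - lam)

lam_i, lam_ip1, lam_n = 1.0, 2.0, 7.0
print(kappa_mu(1 / lam_i, 1 / lam_ip1, 1 / lam_n), kappa_lambda(lam_i, lam_ip1, lam_n))
```

Because $\Delta_{i,i+1}$ in the $\mu$ variables equals $(\lambda_{i+1}/\lambda_i)\,\Delta_{i,i+1}$ in the $\lambda$ variables, the contraction ratio $\Delta(\mu')/\Delta(\mu)$ coincides with $\Delta(\lambda')/\Delta(\lambda)$, so the bound carries over unchanged when reciprocals are inserted.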
Conclusions. The new convergence bound given in Theorem 1.2 completes the efforts to find sharp convergence estimates within the hierarchy of preconditioned PINVIT(k) and non-preconditioned INVIT(k) eigensolvers for the index k = 2; a hierarchy of these solvers has been suggested in [13]. Next the results are summarized. All these convergence estimates have the common form
$$\Delta_{i,i+1}(\lambda(x')) \le \sigma^2\,\Delta_{i,i+1}(\lambda(x))$$
with $\Delta_{i,i+1}(\xi) = (\xi - \lambda_i)/(\lambda_{i+1} - \xi)$.

The convergence factor for the non-preconditioned inverse iteration procedure INVIT(1) is (see [13])
$$\sigma(\text{INVIT}(1)) = \frac{\lambda_i}{\lambda_{i+1}}.$$



The associated preconditioned scheme, i.e. the preconditioned inverse iteration PINVIT(1) or preconditioned gradient iteration, has the convergence factor (see [B])
$$\sigma(\text{PINVIT}(1)) = \gamma + (1-\gamma)\frac{\lambda_i}{\lambda_{i+1}}.$$



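Further, the convergence factor of the non-preconditioned steepest descent iteration INVIT(2) reads (see [16])
$$\sigma(\text{INVIT}(2)) = \frac{\kappa}{2-\kappa} \quad\text{with}\quad \kappa = \frac{\lambda_i(\lambda_n - \lambda_{i+1})}{\lambda_{i+1}(\lambda_n - \lambda_i)}.$$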

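The new result on PINVIT(2), which is the preconditioned steepest descent iteration, is now
$$\sigma(\text{PINVIT}(2)) = \frac{\kappa + \gamma(2-\kappa)}{(2-\kappa) + \gamma\kappa} \quad\text{with}\quad \kappa = \frac{\lambda_i(\lambda_n - \lambda_{i+1})}{\lambda_{i+1}(\lambda_n - \lambda_i)}.$$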

All these convergence factors are sharp. 
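For comparison, the four convergence factors can be evaluated side by side. This Python sketch (function names and sample eigenvalues hypothetical) checks two consequences visible in the formulas: for $\gamma = 0$ the preconditioned factors reduce to the non-preconditioned ones, and $\sigma(\text{PINVIT}(2)) \le \sigma(\text{PINVIT}(1))$, in line with the statement in the abstract that the new estimate always improves that of the fixed step size iteration.

```python
def kappa(lam_i, lam_ip1, lam_n):
    """kappa = lambda_i (lambda_n - lambda_{i+1}) / (lambda_{i+1} (lambda_n - lambda_i))."""
    return lam_i * (lam_n - lam_ip1) / (lam_ip1 * (lam_n - lam_i))

def sigma_invit1(lam_i, lam_ip1):
    return lam_i / lam_ip1

def sigma_pinvit1(lam_i, lam_ip1, gamma):
    return gamma + (1 - gamma) * lam_i / lam_ip1

def sigma_invit2(lam_i, lam_ip1, lam_n):
    k = kappa(lam_i, lam_ip1, lam_n)
    return k / (2 - k)

def sigma_pinvit2(lam_i, lam_ip1, lam_n, gamma):
    k = kappa(lam_i, lam_ip1, lam_n)
    return (k + gamma * (2 - k)) / ((2 - k) + gamma * k)

lam_i, lam_ip1, lam_n = 1.0, 2.0, 100.0
for gamma in (0.0, 0.5, 0.9):
    print(gamma, sigma_pinvit1(lam_i, lam_ip1, gamma), sigma_pinvit2(lam_i, lam_ip1, lam_n, gamma))
```

The inequality $\sigma(\text{PINVIT}(2)) \le \gamma + (1-\gamma)\kappa$ holds because the difference equals $(1-\gamma)\kappa\,(1-\gamma)(1-\kappa)/((2-\kappa)+\gamma\kappa) \ge 0$, and $\kappa \le \lambda_i/\lambda_{i+1}$.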

Further progress in deriving convergence estimates for the hierarchy of non-preconditioned and preconditioned iterations is a matter of future work. Especially for the practically important locally optimal preconditioned conjugate gradient (LOPCG) iteration [3], sharp convergence estimates are highly desired.